Semantic Atomicity and Multilinguality in the Medical Domain: Design Considerations for the MorphoSaurus Subword Lexicon
Abstract
We present the lexico-semantic foundations of a multilingual lexicon whose entries are so-called subwords. These subwords reflect semantic atomicity constraints in the medical domain that diverge from the canonical lexicological understanding in NLP. We focus on criteria for identifying and delimiting reasonable subword units, for grouping them into functionally adequate synonymy classes, and for relating them by two types of lexical relations. The lexicon we implemented on the basis of these considerations forms the lexical backbone of MORPHOSAURUS, a cross-language document retrieval engine for the medical domain.
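To make the idea of subwords grouped into synonymy classes more concrete, the following minimal sketch maps words from two languages to language-independent class identifiers. Everything in it is an assumption made for illustration: the toy lexicon entries, the class identifiers (`#kidney`, `#inflam`), the function names `segment` and `to_class_ids`, and the longest-match segmentation strategy are hypothetical and are not taken from the MORPHOSAURUS lexicon or engine. The two types of lexical relations mentioned above are not modelled.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexica: subword -> synonymy class identifier.
# Entries and identifiers are invented for illustration only.
LEXICA: Dict[str, Dict[str, str]] = {
    "en": {"nephr": "#kidney", "kidney": "#kidney", "itis": "#inflam"},
    "es": {"nefr": "#kidney", "itis": "#inflam"},
}


def segment(word: str, lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Split `word` into lexicon subwords, trying the longest prefix first.

    Backtracks when a chosen prefix leaves an unsegmentable remainder.
    Returns None if the word cannot be fully covered by lexicon entries.
    """
    if not word:
        return []
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in lexicon:
            rest = segment(word[end:], lexicon)
            if rest is not None:
                return [prefix] + rest
    return None


def to_class_ids(word: str, lang: str) -> List[str]:
    """Map a word to the synonymy class identifiers of its subwords."""
    lexicon = LEXICA[lang]
    parts = segment(word.lower(), lexicon)
    if parts is None:
        return []  # no complete segmentation found
    return [lexicon[p] for p in parts]


if __name__ == "__main__":
    print(to_class_ids("Nephritis", "en"))
    print(to_class_ids("nefritis", "es"))
```

Under these assumptions, the English and Spanish words reduce to the same sequence of class identifiers, which is the property a cross-language retrieval engine can exploit when matching queries against documents in another language.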
Similar resources
The MORPHOSAURUS Medical Subword Lexicon – Lexicographic and Semantic Aspects
For technical sublanguages such as the medical one, document indexing based on lexical entities at a subword level has proved useful. However, it still remains challenging to identify and to delimit the meaningful lexical entities, as well as to group them in synonymy classes. We present a lexicographic and semantic foundation underlying the multilingual MORPHOSAURUS lexicon. Resumo. Para lingu...
A Medical Multilingual Information Retrieval
The Web is full of documents and resources. Users employ different strategies to find the information they need: browsing, using search engines, or following existing categories in a Web catalog. For technical sublanguages such as the medical one, document indexing based on lexical entities at a subword level has proved useful. However, it still remains challenging to identify and to delimit the...
Morphosaurus in ImageCLEF 2006: The Effect of Subwords on Biomedical IR
We describe the subword approach we used in the 2006 ImageCLEF Medical Image Retrieval task. It is based on the assumption that neither fully inflected nor automatically stemmed words constitute the appropriate granularity for lexicalized content description. We therefore introduce subwords as morphologically meaningful word units. Subwords are organized in language-specific lexica that we...
A Supervised Method for Constructing Sentiment Lexicon in Persian Language
With the growth of digital content on the internet and social media, sentiment analysis has become an emerging field. The problem, which deals with information extraction and knowledge discovery from textual data using natural language processing, has attracted the attention of many researchers. Construction of a sentiment lexicon as a valuable language resource is one of the imp...
Design and implementation of Persian spelling detection and correction system based on Semantic
The Persian language has special features (graphemes, homophones, and multi-shape clinging characters) on electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...